Algorithms for bispectra: forecasting, optimal analysis, and simulation 
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We propose a factorizability ansatz for angular bispectra which permits fast algorithms for fore- 
casting, analysis, and simulation, yet is general enough to encompass many interesting CMB bis- 
pectra. We describe a suite of general algorithms which apply to any bispectrum which can be 
represented in factorizable form. First, we present algorithms for Fisher matrix forecasts and the 
related problem of optimizing the factorizable representation, giving a Fisher forecast for Planck as 
an example. We show that the CMB can give independent constraints on the amplitude of primordial 
bispectra of both local and equilateral shape as well as those created by secondary anisotropics. We 
also show that the ISW-lensing bispectrum should be detected by Planck and could bias estimates 
of the local type of non-Gaussianity if not properly accounted for. Second, we implement a bispec- 
trum estimator which is fully optimal in the presence of sky cuts and inhomogeneous noise, extends 
the generality of fast estimators which have been limited to a few specific forms of the bispectrum, 
and improves the running time of existing implementations by several orders of magnitude. Third, 
we give an algorithm for simulating random, weakly non-Gaussian maps with prescribed power 
spectrum and factorizable bispectrum. 
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I. INTRODUCTION 



Despite the rapid progress in observational cosmol- 
ogy in the last decade, there are very few observational 
probes that are able to constrain the first instants in the 
evolution of the Universe, when density perturbations 
were created. In addition to the shape of the primor- 
dial spectrum of perturbations and the amplitude of the 
gravitational wave background, any measurable depar- 
ture from pure Gaussianity in the statistics of the pri- 
mordial seeds would severely constrain the inflationary 
dynamics. The level of non-Gaussianity in the simplest, 
single field, inflationary models has now been robustly 
calculated and shown to be too small to be measured 
by upcoming Cosmic Microwave Background (CMB) ex- 
periments [ll, Ql- Various departures from the simplest 
scenario however are thought to produce observable sig- 
nals I, Hi, SB i [3- 

In fact it has become clear that the three point func- 
tion of the primordial fluctuations, the bispectrum, is the 
most promising statistic to probe the small departures 
from Gaussianity that could originate during inflation 
prj . Moreover the structure of the three point function 
contains precious information about the inflationary dy- 
namics For example as a result of causality, the 
three point function in models where only one field is 
effectively responsible for inflation have to satisfy strict 
consistency relations that constrain the configurational 
dependance of the bispectrum [l], Furthermore we 
have learned that the shape of the bispectrum of any 
primordial non-Gaussianity arising in inflationary mod- 
els neatly falls in two separate classes. The momentum 



space three point function in single field models is largest 
when the three momenta are of similar magnitude, while 
for multi-field models where curvature perturbations are 
created by a field different from the one that drives in- 
flation, the bispectrum is largest in the squeezed limit, 
when one of the momenta is much smaller than the other 
two. 

If we are to probe the primordial non-Gaussianities 
using the CMB we also need to consider the departures 
from Gaussianity produced by secondary anisotropies as 
well as residual foreground contamination {eg. [3, [H, 
Hill). In general the non-linear dynamics of Gravity 
and of any probe we wish to use will lead to some de- 
partures from Gaussianity which arc not of primordial 
origin. These additional sources on non-Gaussianity pro- 
duce bispectra with specific configurational dependance. 

It is clear then that if we want to hunt for possible de- 
partures from Gaussainity as well as constrain the evolu- 
tion of perturbations after decoupling through secondary 
anisotropies, we need to develop data analysis tools that 
will allow us to measure the bispectrum efficiently and 
can distinguish between different shapes of the momen- 
tum space bispectrum. The efficiency of the tools will be 
crucial as the expected level of non-Gaussianity is rather 
small so it will only be detectable in large surveys, with 
a large number of pixels. Developing these tools is the 
object of this paper. The effort is timely as many of the 
predicted signals are expected to be detectable by the 
upcoming Planck satellite. 

In spite of the promise offered by the three-point func- 
tion, and the variety of forms that have been calculated, 
there is currently a lack of general methods for connect- 
ing a predicted three-point function with data. The ba- 
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sic problem is that the most general three-point func- 
tion allowed by rotational invariance is described by its 
angular bispectrum Bi-^i^i^, an object with O(^max) de- 
grees of freedom. In this generality, algorithms tend to 
be prohibitively expensive; for example, evaluating an 
optimal bispectrum estimator has cost C(^fnax)- Com- 
putationally feasible algorithms are only known for a 
few special forms of the bispectrum |18l . ,19]. One 
might make an analogy with power spectrum estima- 
tion; many solutions have been proposed to the prob- 
lem of optimally estimat ing power spectra from data 
[IS [III, ia, il, [il [11, il [ii] , but given a prediction 
for the shape of the bispectrum which is to be estimated 
from data, relatively little is known. 

The purpose of this paper is to address this problem by 
proposing a "toolkit" of algorithms for forecasting, opti- 
mal estimation, and non-Gaussian simulation. These fast 
algorithms will apply to any bispectrum which can be 
represented in a factorizable form which will be defined 
below (Eq. ([3])). We will show that a wide range of 

previously calculated CMB bispectra can be represented 
in this form, thus giving wide scope to our methods. Our 
methods do not apply to a completely arbitrary bispec- 
trum Bi^i^i^ , but we believe this to be a necessary feature 
of any treatment which leads to fast algorithms. The fac- 
torizability criterion is a compromise in generality which 
is specific enough to enable practical computation, yet 
general enough to encompass a large number of interest- 
ing cases. 

Our first algorithm ( ijlVIIVI) attempts to "optimize" a 
bispectrum by reducing the size of its factorizable rep- 
resentation. We will see that this is closely related to 
computing a Fisher matrix forecast under the assump- 
tion of homogeneous noise. The optimization algorithm 
can be used as a preliminary step to speed up the other 
algorithms; we will also see examples where a bispectrum 
with an intractably large factorizable representation is 
optimized down to manageable size, thus giving addi- 
tional scope to our methods. As an application, we will 
present f i]VI|) a multi-bispectrum Fisher matrix forecast 
for Planck. 

Second, we give a general Monte Carlo framework for 
estimating bispectra from noisy data, in the presence of 
sky cuts and inhomogeneous noise f WIIfCVIIip . This 
generalizes the estimator proposed by Komatsu, Spergel, 
and Wandelt [l^ to arbitrary factorizable bispectra, and 
also improves it in the case of inhomogenous noise, by in- 
cluding the linear term in the estimator proposed in [l^ . 
We also present some code optimizations which dramati- 
cally improve existing implementations and make Monte 
Carlo estimation practical for Planck. 

Third, we give a simulation algorithm f ^IX[) which out- 
puts random non-Gaussian maps with arbitrary power 
spectrum and factorizable bispectrum. This greatly ex- 
tends the generality of existin g in ethods for simulating 
non-Gaussian fields [11, [s^, [Mil ■ 

Throughout this paper, we make the assumption of 
weak non-Gaussianity. On a technical level, this means 



that the covariance of the three-point function is well 
approximated by its Gaussian contribution. If this ap- 
proximation breaks down, then both the choice of op- 
timal estimator and the estimator variance can be af- 
fected. It might seem that the CMB is so close to Gaus- 
sian that this is never an issue; however, it has recently 
been shown [Tl| that it can be important for bispectra 
of "squeezed" shape, if a several-sigma detection can be 
made, even though the field is still very close to Gaus- 
sian. A second reason that the assumption of weak non- 
Gaussianity might break down is that the lensed CMB 
is non-Gaussian; e.g. in [s^, it is argued that this may 
degrade constraints on /atl by ~ 25% at Planck sensi- 
tivity. Both of these effects are outside the scope of this 
paper. 

II. NOTATION AND CONVENTIONS 

The angular CMB bispectrum Bg-^g^g^ is defined by 

{ai>^miai2m2ae3'ma) - i^iii2i3 \ ra^ ^2 j ' ^ ' 

where the quantity in parentheses is the Wigner 3j- 
symbol. This is the most general three-way expectation 
value which is invariant under rotations. Following Ko- 
matsu and Spergel [s^l , we define the reduced bispectrum 
bi^i2iz by 

r(2£i + l)(2£2 + l)(2£3 + l)V^' 
Bi,i2i. - Y 

Because the 3j-symbol on the right-hand side vanishes for 
triples (fi,^2,^3) such that -^£3) is odd, Eq. ^ 

only makes sense if Bi^i^i^ also vanishes for such triples. 
This condition is equivalent to parity invariance of the 
bispectrum, and will be satisfied for all bispectra consid- 
ered in this paper. 

III. FACTORIZABILITY 

A basic problem in the theory of bispectrum estimation 
is that there are so many degrees of freedom in the bis- 
pectrum Bi-^i^i^ that completely general methods, which 
make no assumption on the form of Bi-^i^i^, are computa- 
tionally prohibitive. For example, even with the unreal- 
istic assumption of homogeneous instrumental noise, the 
cost of evaluating an optimal estimator is 0(^f„ax) [35]- 
In this section, we propose a factorizability ansatz for 
the form of the bispectrum, and show that many CMB 
bispectra of theoretical interest can be written in fac- 
torizable form. Our approach is empirical; we simply 
collect and analyze interesting bispectra from the liter- 
ature. In subsequent sections, we will present fast al- 
gorithms which can be applied to factorizable bipsectra, 
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which will improve the 0{£^^^^^) cost and make calcula- 
tions tractable. 

Our ansatz is that the reduced bispectrum defined in 
Eq. ([2]) is a sum of terms factorizable in £i, £2, £3- 



- 1 E 4f + (3) 



where iVfact is not too large. In Eq. ([3]) and throughout 
the paper, + (symm.) stands for the sum of five addi- 
tional terms obtained by permuting £1, £2, £3. 



Finally, we mention a mathematically trivial but prac- 
tically important example: the bispectrum from residual 
point sources (assumed Poisson distributed) is given by 



b^^e e = constant 



(7) 



The value of will depend on the flux limit at which 
point sources can be detected and masked in the survey 
considered, and on the assumed flux and frequency dis- 
tribution of the sources. As a rough baseline, motivated 
by Hi, i^, we will take fcP"^^^^ = IQ-^ yitK^ at Planck 
sensitivity, corresponding to a flux limit ~ 0.2 Jy at 217 
GHz. 



A. CMB secondaries 

The first general class of bispectra which satisfy the 
factorizability condition (Eq. ([3])) are those which arise 
from three-way correlations between the primary CMB, 
lensed CMB, and secondary anisotropies [fl [Tsj. These 
are of a manifestly factorizable (with A^fact = 3) form, 

, ^i(^i + l)- ^2(^2 + 1) +^3(4 + 1) ^ . 



(symm.) 



(4) 



where depends on the secondary anisotropy which is 
considered. 

For example, with secondary anisotropy given by the 
integrated Sachs- Wolfe (ISW) effect, /^i is equal to Cj"^, 
the cross power spectrum between the unlensed CMB 
temperature and lensing potential. In this case, the bis- 
pectrum should be detectable by Planck [l^, [3^], and 
would provide a direct signature, internal to the CMB, 
of an evolving gravitational potential [s^l • We will refer 
to this as the ISW-lensing bispectrum and use it as a 
running example throughout the paper. Other examples 
of the general form in Eq. (jl]) have also been studied; 



e.g. the Rees-Sciama-lensing bispectrum [16[, and the 
SZ-lensing bispectrum [l^ [l3| . 

Another general class of CMB bispectra is given 
by three-way correlations between Ostriker-Vishniac 
anisotropies and secondary anisotropies [13, HI] . These 
bispectra are of the form: 



dr fiAr)giJr)+ (symm. 



(5) 



To make this factorizable, we replace the r integral by a 
finite quadrature: 



(symm.) (6) 



The bispectrum is then of the factorizable form (Eq. ([3])), 
where Nf^ct is the number of quadrature points needed 
to approximate the integral. This device, replacing an 
integral by a finite sum in order to satisfy the factoriz- 
ability condition, will be used frequently throughout the 
paper. 



B. Primordial non-Gaussianity 

Moving on, we next consider CMB bispectra which 
arise from primoridal non-Gaussianity, rather than sec- 
ondary anisotropies. The general relation between the 
angular CMB bispectrum and primordial three-point 
function can be described as follows. We assume that the 
primordial three-point function is invariant under trans- 
lations and rotations, so that three-way correlations in 
the Newtonian potential $ are given by 

($(ki)$(k2)$(k3)) = {27TfS^{ki+k2+k3)F{ki,k2,h) 

(8) 

where F depends only on the lengths fc^ = |ki|, as implied 
by the notation. In [40j|, it is shown that the reduced 
angular bispectrum is given by 



beiiits = I drr- 



l[^^nAMr)Al{h) 

xF{h,k2,k3) 



(9) 



where Aj, (k) denotes the transfer function between the 
CMB multipoles and the Newtonian potential: 



47r(i) 



d^k 

(27r)3 



Af(fc)$(k)r;„(k) (10) 



Now let us consider some specific examples of primor- 
dial non-Gaussianity. The simplest is the "local model" , 
in which the primordial potential satisfies: 

$(X) = <i>G(x) - /]vl(<i>G(x)^ - (<i>G(x)2)) (11) 

where is a constant and $g is Gaussian. In this 
model, the primordial bispectrum is 



F'°'^{k,,k2,ks)^f^l, ,4-„.,4-„. 

1,1 rCo 



A2 



+ (symm.) (12) 



where is a constant and P*(fc) = A$/fc'*~"-' is the 
primordial power spectrum. Substituting Eq. (I12p into 
Eq. ([9]), the angular CMB bispectrum is 



bi%is=fNl I drr'l3,,{r)l3,,{r)a,,{r)+ (symm.) 



(13) 
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where, following [34| . we have introduced the functions 



aeir) 



dkk^jt{kr)Aj{k) 



(14) 



Mr) = I dken{kr)Aj{k)(^-^}j (15) 

After replacing the r integral in Eq. by a finite 

quadrature, the local bispectrum is of factorizable form 
([3]), with -/Vfact equal to the number of quadrature points. 

This illustrates a general point: suppose that the pri- 
mordial bispectrum F{ki^k2,kz) is a sum of M terms 
each factorizable in fci, fc2, ^3. Then the resulting CMB 
bispectrum bg^g^g^ will be a sum of A'fact = -M-^quad terms 
each factorizable in £1, ^2, ^3, where A^quad is the number 
of quadrature points need to do the r integral in Eq. 

Our next example is taken from 4l|, in which a pri- 
mordial bispectrum of the form 



FS--(ki,k2,k3) 



/fr(ki,k2,k3) 



"-1 "-2 



-/fr(ki,k2,k3) (16) 



ki.k2 , 3(ki.k3)(k2-k3) 



^3 



is studied, arising from second-order gravitational evolu- 
tion after inflation 48] . Using the constraint (ki -I- k2 4 
ks) = from Eq. ([H]), we rewrite / 
form 



in factorizable 



fNT{ki,k2, 



1 A^j^ 3/^-1^/^2 



12 



^3 



2^3 



(17) 



The resulting CMB bispectrum ^fj^^jf^ will then be of 
factorizable form (Eq. with A^eact = 4A^quad, where 
-^quad is the number of quadrature points needed to do 
the r integral in Eq. ([5]). 

This example illustrates the ubiquity and power of the 
factorizability ansatz. In 'il'l, finding a computationally 
feasible method for computing Fisher matrix uncertain- 
ties for the gravitational bispectrum at Planck noise lev- 
els was left as an unsolved problem. After representing 
the bispectrum in factorizable form, we will find, using 
the algorithms to be presented in the rest of the paper, 
that in addition to computing the Fisher matrix rapidly, 
we can compute an optimal estimator and construct non- 
Gaussian simulated maps for this bispectrum. 

Next, we consider the "higher derivative model" from 
p^ . which arises from adding the higher derivative op- 
erator (V(/))'*/(8A'') to the inflationary Lagrangian 0, Si- 



Universe at the time the triangle of interest crosses the 
horizon during inflation. For equilateral configurations 
all three modes in the triangle cross the horizon at ap- 
proximately the same time. 

Following a standard convention, we have introduced 
a parameter /^"^ for the amplitude of the bispec- 
trum, normalized so that with all three momenta equal, 
F{k, k, k) = 6/^'ip(A$ /k^f. The value of is given by 
= (35/432)07 A-*, where </> is the velocity of the in- 
flaton. (The same bispectrum also arises, typically with 



a larger value of for DEI inflation [8|.) 

The factor l/(fci + fc2 + k^^f appears to ruin factoriz- 
ability; however, this disease can be cured by introducing 
a Schwinger parameter t and writing 



1 



(fci 



dt ie-*(*=i+'=2+'^'^) (19) 



Using Eqs. Q, one arrives at a CMB bispectrum 

of the form: 



Lhd 



_ 9 Yid 



(20) 



poo pOQ 

+^^T dtt drr^(^peAr,thi,{r,t)a,,{r,t) 



+Si^{r,t)-fe^{r,t)-fi^{r,t)] + (symm.) 



where we have defined: 
2 



ae{r, t) 
1i{r, t) 



dk k^ji{kr)Aj{k)e^ 



-tk 



(21) 



dkk'Mkr)Ai {k)e-''' ( =f 



dk k^je{kr)Aj{k)e 



-tk 



dk k^ji{kr)Aj{k)e 



-tk 



A$ 
A$ 



1/3 



2/3 



(This ordering was chosen so that, for t — 0, these re- 
duce to the functions ae{r), (3i{r) defined in Eq. pi)) 
and the functions je{r), 5i{r) defined in |ll|-) Note that 
we have assumed scale invariance throughout our treat- 
ment of the higher-derivative bispectrum. It is seen that 
the bispectrum is of factorizable form (Eq. ([3])), with 
A^fact = A^i -f- 2A^2, where A^i, A^2 are the numbers of 
quadrature points needed to do the single and double 
integrals in Eq. pp]) . 

Finally, following [l^, we introduce the "equilateral" 
bispectrum 



ki k2 



ki ^2 ^3 



(fci + k2 + ksY 



(18) 
-|- (symm.) 



This expression assumes scale invariance. In the general 
case, the amplitude of the three point function depends 
on the dynamics of the field and the expansion of the 



NL 



F^^(fci,A:2,fc3)-/ 
-2 



7,4— rig 7 4— ris 
1^1 '^2 



A^ 



(22) 



kt 



2(4-n,)/3^2(4-n,)/3j,2(4-n,)/3 



+6- 



A2 



(4-n,)/3^2(4-n,)/3j,4-«, 



(symm.) 
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In contrast to the other bispectra discussed so far, this 
bispectrum does not arise from a model; rather it is de- 
signed to approximate the higher-derivative bispectrum 
(Eq. (Uni)) using fewer factorizable terms. In 18], it was 
shown that the two bispectra are so highly correlated 
that it suffices to work with F^"^ for data analysis pur- 
poses. We will confirm this result in WII The equilat- 
eral bispectrum is manifestly of factorizable form, with 
-^fact = 3-/Vqiiad, where iVquad IS the number of quadrature 
points needed to do the r integral (Eq. 

As the name suggests, for the equilateral bispectrum 
(Eq. (HH)), most of the signal-to-noise is contributed by 
triples (i?i, ^2, ^3) for which the £'s are of comparable mag- 
nitude. In contrast, for the local bispectrum (Eq. (I12p ). 
the greatest contribution is from "squeezed" triangles in 
which £1 <C ^2,^3- This is reflected in the asymptotics of 
the 3D bispectra in the squeezed limit k2 — k^, ki — > 0. 
The leading behavior of the local bispectrum is 0{k^'^), 
whereas the higher-derivative (Eq. and equilateral 

bispectra have leading behavior 0{k^^) arising from a 
cancellation between terms. The gravitational bispec- 
trum (Eq. ([in])) has the same C'(fcf'^) behavior as the 
local bispectrum. 



IV. FISHER MATRIX 



A "brute force" harmonic-space approach is to sim- 
ply perform the sum over all triples (^1,^2,^3) given in 
Eq. ([23|l . evaluating the bispectra [Ba)i^i^i3 by straight- 
forward use of Eqs. (21), ([2]). The computational cost of 
this approach is ©(A^fact^max)- I'^ many cases, we have 
found that the harmonic-space approach gives reasonable 
performance and allows the Fisher matrix to be com- 
puted straightforwardly. 

A second approach is based on computing the Fisher 
matrix in position space rather than harmonic space. For 
notational simplicity we will present the method for the 
case of a single bispectrum (so that the Fisher matrix re- 
duces to a number F) but our method extends straight- 
forwardly to the multi-bispectrum case. Writing out the 
Fisher matrix, 



F = 



1 {Bt^i^i^Y 



(25) 



E 



(2^1 + 1) (2^2 + 1) (2^3 + 1) ( h h h 





we use the identity 



1447r 



(symm.) 



(26) 



Before discussing inhomogeneous noise, we first con- 
sider the simplest possible noise model for a survey: ho- 
mogeneous parameterized by noise power spectrum N^. 
Such a noise model can be used as an approximation 
when forecasting sensitivity to different bispectra. In this 
noise model, the Fisher matrix of bispectra Bi, B2, ■ ■ ■ is 
defined by 



"a/3 



dcf 1 

6 



E 



(Bo.) 



i{B 13)11 



£2(3 



(23) 



where Ce = + Ni is the total signal -I- noise power 
spectrum. (In Eq. (|23p . we have written the Fisher ma- 
trix for an all-sky survey; for partial sky coverage, one 
makes the approximation that the Fisher matrix scales 

as Fa/3(/sky) OC /sky -Fa/3 (1).) 

The bispectrum covariance obtained from the survey 
is given by the inverse Fisher matrix: 



Cov(B„,S0) = (F- 



(24) 



(In particular, the marginalized la error on bispectrum 

Ba is given by (Ba) = {F-^fJS while the error 

with the other bispectra fixed is given by crfixGd(5a) = 
(Fqq)"^/^.) Thus, the Fisher matrix gives a complete 
forecast for bispectrum sensitivity of a given survey, in- 
cluding cross-correlation information, under the simpli- 
fying assumption of homogeneous noise. 

Let us consider the problem of efficient evaluation of 
the Fisher matrix in Eq. ([^5]) , assuming that the bispec- 
tra under consideration satisfy the factorizability condi- 
tion from ijllll 



h i2 i3 





to write F in the form 



(27) 



(28) 



where we have defined 



dcf 2tt 



2 fl 



Cx(0xU)Cy(i)FO)Cz(0zO)+ (perm.) 



(29) 

where -|- (perm.) denotes the sum of five additional terms 
obtained by permuting {X'^^\Y^J\ Z'^^l} and 



d_cf^(2£+2)W 



(30) 



where Pi{z) denotes the Legendre polynomial. 

To turn this into an algorithm for computing F , we 
note that the integral in Eq. ([29]) can be done exactly, 
using Gauss-Legendre integration [42] with A^quad = 
L3Anax/2j + 1 quadrature points, since the integrand is 
a polynomial in z whose degree is < 3€max- We loop 
over quadrature points z, computing the value of each 
function C,xy{z) which appears, and accumulating the 
contribution to each Fij from point z, before moving on 
to the next quadrature point. This gives a position-space 
algorithm for Fisher matrix evaluation whose computa- 
tional cost is C'(A^tact^max)- a rough rule of thumb. 
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we have found that this position-space method is faster 
when TVfact < 2^i„ax, and the C'(A'^fact^max) harmonic- 
space method is faster when A^fact ^ 2£niax, but the con- 
stant which appears here wiU depend on the implemen- 
tation. 

We have introduced the matrix Fij as a device for com- 
puting the 1-by-l "matrix" F, by summing the entries as 
in Eq. (|28p. but we note that Fij has a direct interpre- 
tation as the iVfact-by-A'fact Fisher matrix of the individ- 
ual terms in the factorizable bispectrum (Eq. Q). This 
observation will be important for the optimization algo- 
rithm which we now present. 

V. OPTIMIZING iVtact 

From the preceding discussion, it may seem that our 
position-space method for Fisher matrix evaluation is of 
limited usefulness, providing significant speedup over the 
harmonic-space method only in the regime A'^fact ^ ^max- 
However, as we will see in this section, the position-space 
method also leads to a means of "optimizing" a bispec- 
trum written as a sum of many factorizable terms: 

h.i.e, = g ^ X^Y^^Z^+ (symm.) (31) 

i—l 

by approximating b^^i^i^ by a factorizable bispectrum 
with fewer terms. We present an algorithm which re- 
tains a subset (of size iVopt) of the original terms and 
chooses weights wi, . . . , WN^j^t such that the bispectrum 

b'i^i.is = 6 E ^^X^Y^^Z^+ (symm.) (32) 

i—l 

is a good approximation to b. (In Eq. (j32p and through- 
out this section, we have assumed for notational simplic- 
ity that the terms in the original bispectrum (Eq. PT|) ) 
have been reordered so that the terms to be retained are 
in positions 1, . . . , iVopt.) 

Generally speaking, it is only possible to optimize 
a bispectrum by exploiting redundancy in the factoriz- 
able representation, such as consecutive terms which are 
nearly equal, or terms which are small enough to be neg- 
ligible. For example, a randomly generated factorizable 
bispectrum cannot be optimized. The canonical exam- 
ple where optimization is successful is the case where the 
bispectrum is given exactly by an integral over conformal 
distance r as in Eq. In this case, the input bispec- 
trum (Eq. pip ) could be obtained by oversampling the 
integral using a large number A^fact of quadrature points 
in r. The output bispectrum (Eq. ([32|l ) would represent 
a more efficient quadrature, specifically tailored to the r 
dependence of the bispectrum under consideration, using 
a smaller number TVopt of points, with integration weights 
given by the Wi. 

For Fisher matrix forecasts, it is often unnecessary to 
optimize A^fact! as discussed in i jlVl the Fisher matrix 



can frequently be computed in harmonic space even if 
the number of factorizable terms is large. However, for 
the analysis and simulation algorithms which will be pre- 
sented in Wm^IXl we will see that optimizing A^fact as 
a preliminary step usually makes a large improvement in 
the cost. This is the main benefit of the optimization 
algorithm which wc now discuss. 



A. Optimization algorithm 

Let us first ask: in what sense is one bispectrum S^^^^fg 
a good approximation for another bispectrum Bg-^i^i.^? 
Our criterion is based on distinguishability; the approxi- 
mation is good if the Fisher distance 

^(^^^y^fl^ (5w3_|^ (33) 

ili2i3 

is small. Here, Ci is a signal -I- noise power spectrum 
characterizing the survey under consideration, which is 
required as an input to our optimization algorithm. We 
usually iterate our algorithm until F{B, B') is of order 
10^^ or smaller; this corresponds to an optimized bis- 
pectrum which cannot be distinguished from the original 
to better than O.OOlcr. (In a realistic survey, the noise 
will be inhomogeneous and hence not describable by a 
power spectrum, but because our termination criterion 
is so conservative, it suffices to use a noise power spec- 
trum which roughly models the survey.) 

As a first step toward an optimization algorithm, sup- 
pose that we have already chosen a subset of terms to 
retain, and want to choose optimal (in the sense that 
the Fisher distance F{B, B') is minimized) values for the 
weights Wi. In ijlVl we showed (Eq. ((29|) ) how to ef- 
ficiently calculate the A'fact-by-A'fact Fisher matrix Fij 
between the individual terms in the input bispectrum 
(Eq. (|3ipV If F is block decomposed into submatrices of 
size A'^opt and (A^fact - A^opt), 

then the Fisher distance is given by 
F{B,B') = ^(l-mO(i^oo).,(l-i«,) (35) 
+2^(1 - Wi){Foi)u + Y,iFii)iJ 

iJ I J 

Note that we use lower case to denote indices in the first 
blocks of the decomposition in Eq. ([M)) and upper case 
for the second block. The Fisher distance F{B^ B') is 
minimized by choosing 

(«^»)opt = 1 + E(^o"o'-P^oi),:j (36) 
J 
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and the value at the minimum is given by 



F{B, B')opt 



ij 



(37) 



Now that we have seen how to optimize the weights, 
we address the problem of choosing which terms in the 
original bispectrum (Eq. ((3T|l ) to retain. Mathematically, 
this corresponds to choosing a subset of terms such that 
F{B, B')opt (Eq. ([SZl)) is minimized. (Note that after set- 
ting the Wi to their optimal values, F{B, -B')opt still de- 
pends on the subset of terms which are retained, but this 
dependence is hidden in Eq. ()37p . which follows our con- 
vention of assuming that the terms have been permuted 
so that terms {1, . . . , iVopt} are retained.) Since exhaus- 
tive search of all subsets would be prohibitively slow, our 
approach is to use a greedy algorithm: we build up a sub- 
set of terms iteratively, in each iteration adding the term 
which results in the greatest improvement in F{B, -B')opt- 
The algorithm terminates when F{B, B')opt has reached 
an acceptably small value. 

The simplest implementation of this algorithm would 
evaluate F{B,B')opt from scratch, using Eq. ([37|) . for 
each candidate term in each iteration, which would lead 
to a running time of 0{N^p^^Nf^^^), in addition to the 
time needed to precompute the matrix Fij . This can be 
improved by two optimizations. First, in the n-th iter- 
ation of the algorithm, suppose we have chosen n terms 
to retain and want to choose the (n-l-l)-st. If the matrix 
F is decomposed into blocks of size n and (iVfact ~ n) 
as in Eq. ([M]) . then one can show that the change in 
F{B, B')opt if the I-th term is added is given by: 



AF{B,B%pt = 



(Fii - i^jj^i^oo^Foi)// 



(38) 



After computing the matrix {Fu — FQ-^^Ff^Q^ Fqi) , Eq. 
allows us to select the (n-l-l)-st term to be retained in 
time 0{Nf^^^). This optimization improves the cost from 
^^(^opt^fect) to 0(7VoptiV3,t); the hmiting step is re- 
computing the matrix {Fu — Fq-^^Fq^^^ Fqi) from scratch 
for each n. 

This brings us to the second optimization: instead of 
keeping the matrix F, in each iteration we keep the ma- 
trix G defined by: 



G 



dcf 



-Fn, 



-Fnr^Foi 



00 00 



(39) 



Note that the lower right block of G is the matrix needed 
to efficiently select the next term, as described in the pre- 
vious paragraph. The other blocks have been constructed 
so that it is possible to update G in 0{Nf^^^) time when 
advancing from the n-th iteration of the algorithm to the 
(n-l-l)-st iteration (rather than recomputing from scratch 
at cost 0{Nf^^^)). More precisely, assuming that terms 
have been permuted so that the new term to be retained 
is in the (n-l-l)-st position, the update rule is given as 



follows. If G is given by the block decomposition (into 
sizes n and (A'^fact — n)) 



G 



Gqq Gqi 

Gqi Gil 



Gqo 




Ao 




7 






vl 


Al 


Vl 


Ai, 



(40) 



in the n-th iteration, then it is given by 





I Goo 


- (l/7)woW(f 




Ao - {lh)vovJ 


G^ 




-'"oil 


-1/7 






Uo 


- (l/7)«it'(f 


-wi/7 


Ai - {lh)vivj 



(41) 

in the (n+ l)-st iteration. (Note that the middle "block" 
in Eqs. (|¥T|) has size 1, e.g. 7 is a number.) By keep- 
ing the matrix G, and using the update rule (Eq. ([1T|) ). 
the cost of the optimization algorithm is further improved 

to 0(7VoptA^Lt)- 

Putting this all together, our optimization algorithm 
can be summarized as follows. We initialize the G-matrix 

and initialize the 



to the matrix Fij defined in Eq. 
"score" F{B, B') to the sum of the entries of F. We then 
iterate the following loop (starting with 71 = 0): 

1 . In the 71-th iteration, we have already chosen a sub- 
set of n terms to retain and permuted the original 
terms so that these are in positions 1, . . . , n. We 
have stored the G-matrix defined by the block de- 
composition in Eq. (|40p . 

2. Choose the index / = n + 1, . . . , A'fact which maxi- 
mizes 



AF(B,B')opt= 



G/7 



(42) 



This represents the (n -I- l)-st term chosen to be 
retained. 



3. Update F{B, B')opt according to Eq. ([55]) . permute 
the terms so that the new term is in the {n + l)-st 
position, and update the G-matrix according to the 
rule in Eq. (gT]) . 

4. If F{B, B') has become acceptably small, then ter- 
minate, returning the subset of terms to be retained 
and the optimal weights 

(«^»)opt = l-^Ga (43) 
J 

Otherwise, continue to the (n + l)-st iteration. 

The total running time is 0(iVopt Affect)' which m prac- 
tice is usually less than the C(^max^f'ict) time needed to 
precompute the matrix Fij . 
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V 


Sfwhm 


At 


/sky 




WMAP3 


41 GHz 


28' 


460 mK' 


0.77 


1000 




61 GHz 


20' 


560 mK' 


0.77 


1000 




94 GHz 


12' 


680 iiYJ 


0.77 


1000 


Planck 


100 GHz 


9.2' 


51 mK' 


0.8 


2000 




143 GHz 


7.1' 


43 ii¥J 


0.8 


2000 




217 GHz 


5.0' 


65 ^iK' 


0.8 


2000 



TABLE I: Experimental parameters used when making Fisher 
forecasts for three-year WMAP and Planck, taken from the 
WMAP3 public data release and [4^ respectively. 




1.5x10" 



Conformal distance r (Mpc) 



FIG. 1: Contribution to B'^i^i^i^ as a function of conformal 
distance r, with dominant contribution from recombination 
(r ~ 14000 Mpc), for (^1,^2^) = (2,300,300), a typical 
squeezed triple with high signal-to-noise. 



B. Optimization examples 

We now give some examples of our algorithm for op- 
timizing A'^fact, using bispectra from jjllll We use noise 
power spectra which roughly approximate the three-year 
WMAP and Planck surveys, as given in Tab. IH (Note 
that, because the optimization algorithm is Fisher dis- 
tance based, a noise power spectrum is one of the inputs.) 

First consider the local bispectrum ^g^i^g^, which is 
an integral over conformal distance r (Eq. (|13p ). with 
one factorizable term for each quadrature point needed 
to do the integral. To get a feeling for how differ- 
ent values of r contribute, we show the integrand for 
(^1,4, h) = (2, 300, 300) in Fig.Hl (This particular triple 
was selected for high signal-to-noise; the local bispectrum 
is dominated by "squeezed" triples with l\ <C ^2,-^3-) 
This structure, showing a large contribution from recom- 
bination [r ^ 14000 Mpc), with secondary contributions 
from reionization [r - 10500) and ISW (r < 5000), 
is typical of bispectra which arise from primordial non- 
Gaussianity. 

To be conservative, we oversample the r integral us- 
ing a quadrature with 1200 points as shown in Tab. [TTl 



< r < 9500 
9500 < r < 11000 
11000 < r < 13800 
13800 < r < 14600 
14600 < r < 16000 
16000 < r < 50000 



A'^quad ~ 150, linearly-spaced 
Aquad ~ 300, linearly-spaced 
Aquad ~ 150, linearly-spaced 
Aquad = 400, linearly-spaced 
Aquad = 100, linearly-spaced 
Aquad = 100, log-spaced 



TABLE II: Quadrature in r used when computing bispectra 



in §V21 with a greater density of points near reionization 
(second row) and recombination (fourth row); units for r are 
Mpc. 





Afact 


Aopt 


Aopt 




(input) 


(WMAP3) 


(Planck) 


Point source (Eq. (0) 


1 


1 


1 


ISW-lensing (Eq. @) 


3 


3 


3 


Local (Eq. (fT2))) 


1200 


11 


21 


Equilateral (Eq. (f22)l'l 


3600 


24 


47 


Gravitational (Eq. ([TeJ) 


4800 


168 


255 


HD (Eq. (Ull)) 


80000 


33 


86 



TABLE HI: Number of terms Afact obtained for the point 
source, ISW-lensing, local, equilateral, gravitational, and 
higher derivative bispectra, after oversampling the integrals 
using the integration quadratures described in i]V Bl and num- 
ber of terms Aopt which are retained after using the optimiza- 
tion algorithm for WMAP3 and Planck noise levels. 



This quadrature was obtained empirically by increasing 
the sampling until each of the bispectra considered in 
i jllll had converged at the percent level, for several rep- 
resentative choices of (^1,^2, ^3)- Using this quadrature 
for the local bispectrum, we obtain a bispectrum with 
-^fact — 1200. After running the optimization algorithm, 
optimized bispectra with A^opt — H or 21 factorizable 
terms are obtained, for WMAP3 or Planck noise levels 
respectively. In this case, the optimization algorithm can 
be thought of as computing a more efficient quadrature 
in r by choosing both quadrature points and integration 
weights Wi. The resulting quadrature is optimized so that 
it results in a bispectrum which is indistinguishable at 
the given noise levels from the oversampled bispectrum, 
while using far fewer quadrature points. 

Repeating this procedure for the gravitational 
(Eq. (|16p) and equilateral (Eq. ([H])) bispectra, we ar- 



rive at the A'opt values given in the middle rows of 



Tab. mil We emphasize that in all cases, the input and 
output bispectra are instinguishable (to O.OOlti) at the 
given sensitivity levels, since the optimization algorithm 
is only allowed to terminate when the Fisher distance 

i^(Binput,Bopt) is < 10-6. 

As a final example, consider the higher-derivative bis- 
pectrum (Eq. (|20p ). In this case, the bispectrum is a 
double integral over conformal distance r and Schwinger 
parameter t. For the t integral, we use a uniform quadra- 
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FIG. 2: Distribution of factorizable terms in the (r, t) plane 
for the "optimized" higher-derivative bispectrum b^fi^i^. 



10-2 



ture in log(i) with three points per decade from t 
to t — 10®. With this quadrature, we find that the iden 
tity 



1 



(fci 



^3) = 



dt ie-*(*=i+'''2+'=^) (44) 



holds to 0.1%, throughout the range of wavenumbers 
10^^ ^ ^ 1 Mpc^"'^ where the photon transfer func- 
tion A^(fc) is appreciably different from zero (for ^,nax = 
2000). As explained in fcll Eq. (gl]) is the basis for writ- 
ing the higher-derivative bispectrum in factorizable form. 
For the r integral, we use the quadrature in Tab.HTl with 
one additional sublety mentioned for completeness: at 
large values of t, we have found that the inner integral 
over r contains contributions from large values of r. For 
this reason, we extend the log-spaced quadrature in the 
last row of Tab. [IT] to r^ax = (5 x lO*") using 400 addi- 
tional points. 

Combining this 25-point quadrature in t and 1600- 
point quadrature in r, and with 2 factorizable terms per 
quadrature point, we obtain an "oversampled" higher- 
derivative bispectrum with iVfact = 80000. It is infeasible 
to optimize this bispectrum directly since the cost of the 
optimization algorithm is O {Nf^^^i"^^^) . For this reason, 
we use a two-level optimization procedure, first separat- 
ing the factorizable terms into batches with ^ 1000 terms 
which are optimized separately, then combining the "op- 
timized batches" into one bispectrum which is optimized 
again. The final result is a bispectrum with A'fact = 33 
or 86, for WMAP3 or Planck respectively. For Planck, 
we show the distribution of terms in the (r, t) plane in 
Fig. [21 Even though the oversampled bispectrum con- 
tains terms throughout the ranges 10^^ < t < 10® and 
< r < (5 X 10®), the optimization algorithm finds that 
a much smaller range in (r, t) suffices to accurately ap- 
proximate the bispectum. 

In more detail, the two-level optimization procedure 
used for the higher-derivative bispectrum is given as fol- 



lows. The input bispectrum is split into N bispectra 

L^inputJ ■ 



JV 



Si 



Er(*) 
^input 

i=l 



(45) 



each of which is optimized separately, obtaining bispectra 
{-^opt} which satisfy: 

10-® 



47V2 



(46) 



These are combined for a final round of the optimization 
algorithm, obtaining an output bispectrum Bopt which 
satisfies: 



F B, 



opt' ^opt 



< 



10- 



(47) 



With the threshhold values given on the right-hand sides 
of Eqs. (Uni), (H7)) . it can be proved that the output bis- 
pectrum satisfies F(i3input, ^output) < 10~®, our stan- 
dard termination criterion. Thus the accuracy of the ap- 
proximation need not be degraded by use of the two-level 
procedure. 

An interesting and counterintuitive byproduct of the 
two- level procedure is that, even when the number of 
terms in -Binput is so large that direct computation of 
the 1-by-l Fisher matrix -F(i?input, ^input) is infeasible, it 
may be possible to obtain an optimized bispectrum i?opt 
whose Fisher distance to i?input is provably small. This 
increases the scope of the factorizability ansatz: even 
if the number of terms required to represent a bispec- 
trum in factorizable form appears to be intractably large, 
the optimization procedure may succeed in reducing to 
a more efficient representation, for which the algorithms 
described in this paper will be affordable. 

Let us conclude by emphasizing the sense in which the 
output bispectrum from the optimization algorithm ap- 
proximates the input bispectrum. The only requirement 
is that the two are experimentally indistinguishable to 
O.OOlcr. Intuitively, this means that they approximate 
each other at percent level in regimes which contribute 
the most signal-to-noise. In regimes where the input bis- 
pectrum is too small to contribute significant signal-to- 
noise, the output bispectrum is not guaranteed to resem- 
ble the input; it is only guaranteed also to be small. This 
is why, for example, our optimization algorithm tends 
to drop all contributions after reionization (r < 9500); 
these mainly contribute (via the ISW effect) to triples 
{£i, £2, (3) in which each ii is small, and the total contri- 
bution of such triples to the Fisher matrix is negligible. 
(In fact, such contributions could presumably be dropped 
from the outset, but our approach is to conservatively 
oversample the integrals and let the optimization algo- 
rithm determine which contributions are negligible.) 

Our focus will be on Fisher matrix forecasts and bis- 
pectrum estimation, for which this distinguishability- 
based criterion is ideal, since it permits extremely aggres- 
sive optimization of the bispectrum, as seen in Tab. IIIII 
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Pt. src. 


ISW 


Loc. 


Eq. 


Grav. 


HD 


Pt. src. 


(0.05) 


0.00 


0.00 


-0.01 


0.00 


-0.01 


ISW 


0.00 


(0.16) 


-0.24 


0.00 


0.25 


0.01 


Local 


0.00 


-0.24 


(6.3) 


0.25 


0.78 


0.28 


Equil. 


-0.01 


0.00 


0.25 


(66.9) 


0.36 


-0.98 


Grav. 


0.00 


0.25 


0.77 


0.36 


(33.4) 


0.28 


HD 


-0.01 


0.01 


0.28 


-0.98 


0.28 


(59.9) 



TABLE IV: Fisher matrix for Planck, using bispectra from 
Tab. mil and noise parameters from Tab. HI Off-diagonal en- 
tries are correlations; the diagonal parenthesized entries are 
la Fisher matrix errors {~F~^^^^) on the amplitude of each 
bispectrum, without marginalizing the others. 
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However, the caveat should be mentioned that the op- 
timized bispectrum cannot be used for all purposes; for 
example, for understanding the behavior of the bispec- 
trum in noise-dominated regimes. 



VI. FORECASTS FOR PLANCK 

Armed with the optimized bispectra in Tab. Illli it 
is straightforward to perform a Fisher matrix analysis 
for Planck, using noise parameters from Tab. U The 
result is shown in Tab. IIVI we obtain cr(/]^£) = 6.3, 
^(/^l) ~ ^^-^ ^ 6"' detection (corresponding to 
la error 0.16) of the ISW-lensing bispectrum. For the 
point source bispectrum, we have taken a reference value 
fjps ^ ^j^3 described in f|Tni and found a - 20a 
detection, but this figure should be taken very roughly 
since the value of b^^ is very sensitive to the flux limit 
which is assumed. 

One result from the Fisher matrix forecast is that the 
higher-derivative bispectrum (Eq. (fig]) ) is 98% correlated 
to the equilateral shape (Eq. (HH)). In practice, this 
means that it suffices to use the (simpler) equilateral form 
when analyzing data; the two bispectra cannot be distin- 
guished (at la) unless a 25a detection can also be made. 
This result agrees well with 18'], where a 0.98 correlation 
was also found. 

A second result is that the gravitational bispectrum 
(Eq. (|16p ) is well below the detectability threshhold for 
Planck. (We also find that it is 97% correlated to a lin- 
ear combination of the local and equilateral bispectra, 
but this is not relevant if its amplitude is too small to 
be detected.) One way to quantify this statement is by 
quoting an "effective" value of f^L for which the local 
bispectrum (Eq. (|12p ) has the same Fisher matrix er- 
ror as the gravitational bispectrum (whose amplitude is 
assumed fixed). Using the Fisher matrix we have com- 
puted, we obtain = 0.2. This value does not depend 
strongly on ^maxj as can be seen in Fig. [31 where we show 
the dependence of the Fisher matrix errors on ^max, as- 
suming Planck noise characteristics throughout. 

This result seems to disagree with [4l|, who found 



FIG. 3: Fisher matrix errors vs. ^max for the ISW-lensing, 
local, equilateral, and gravitational forms of the bispectrum, 
assuming Planck noise levels throughout. 



« 4 at £inax = 500 with a trend toward increasing 
fffj^ as £max increases. Since the two methods for cal- 
culating the Fisher matrix are so different, it is difficult 
to compare the calculations directly. However, the result 
that the gravitational bispectrum is "weaker" than the 
local bispectrum with = 1 can be seen intuitively 
from the 3D bispectra in Eqs. (|12|) . (fT6|) . One can prove 
that the ratio of the two bispectra satisfies 

fg-(fci,fe,fc3) ^ 13 
F'oc(,,,,„fc3) (48) 

for all values of (fci, k2,k^) which satisfy the triangle in- 
equality, and the ratio is close to zero for the "squeezed" 
configurations which contribute greatest signal-to-noise. 
For example, in the squeezed limit fc2 = fcs, /ci — > 0, the 
ratio approaches 1/6. 

Another conclusion from the Fisher matrix forecast is 
that the equilateral bispectrum seems more difficult to 
detect than the local bispectrum. This is in some sense 
just a matter of convention in the definition /at^, in the 
different cases. This has already been observed in the 
context of WMAP; e.g. in [H], the la errors a{f§l) = 34 
and a{f'^j^) = 147 were obtained from three- year WMAP 
data. Here we find (Fig. [3]) that this trend becomes some- 
what more pronounced with increasing ^max; for Planck 
(^max = 2000), the ratio cr(/^^)/a-(/]^£) has increased 
to 10.6. 

Finally, we find that correlations between the point 
source, ISW-lensing, local, and equilateral bispectra are 
small. In effect, the CMB gives independent constraints 
on these four bispectra which are not appreciably de- 
graded by marginalizing the other three. However, the 
correlation between the ISW-lensing and local bispec- 
tra is large enough that the presence of the former 
(a guaranteed 6a signal) contributes non-negigible bias 
(A/]^£ — 9.8) when estimating the latter. If a multi- 
bispectrum analysis with marginalization is performed. 
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FIG. 4: Contour plots of dF/d(log^max)d(^miii/^max), defined 
in Eq. (|49p . showing the contribution to the Fisher matrix 
error as a function of (i?min, ^max), for the ISW-lensing, local, 
equilateral, and gravitational forms of the bispectrum. 



then this bias will be subtracted without significantly de- 
grading cr(/]^£). A similar comment applies to the point 
source bispectrum; we have found that at the Planck ref- 
erence value {b^^ — 10~* mK'^), there is neghgible bias 
contributed to the other bispectra, but the point source 
bispectrum should be marginalized in practice since its 
value is quite uncertain. 

It is illuminating to show the contributions to the 
Fisher matrix from differently shaped triples (^1,^2,^3)- 
In Fig. m we show contour plots of the quantity 



dF 



dlog(fmax) d{imin/^ max J 



max 



{Be 



J' 



(49) 

The Jacobian (^^ax) has been included as a prefactor 
so that the Fisher matrix element F will be given by 
integrating over the variables {log(£inax), (^min/^max)} on 
the axes of the plot. The sharp feature seen in these plots 
at ^min/^max = (1/2) ariscs solely from the behavior of 
the Wigner 3j symbols, and would be present even if the 
reduced bispectrum bi-^i^i^ were simply proportional to 

(Q,Q,C,3)l/2. 

It is seen that the equilateral bispectrum receives most 
of its signal-to-noise from the "equilateral" regime (^min 
comparable to ^max), whereas the ISW-lensing and local 
bispectra receive highest contributions from the squeezed 
regime (^min ^ ^max), confirming the intuitive picture 
from the end of i jllll From this description it may seem 
surprising that the correlation between the ISW-lensing 
and local bispectra is so small (0.25). This is because 
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FIG. 5: Values of bij^^e^ts /{CciCi^Ce^Y^^ , showing oscilla- 
tions in the ISW-lensing bispectrum but not the local bispec- 
trum, plotted for £i — 10 and £3 = £2 + 6. This is a typical 
"squeezed" triangle which contributes high signal-to-noise in 
both cases. 



the ISW-lensing bispectrum contains oscillatory features 
which are not present in the local case (Fig. [5]). These are 
hidden in Fig. |4] and help orthogonalize the two bispec- 
tra. If one were to artificially replace bi-^i^i^ — > l^^i^a^aL 
then the correlation between the ISW-lensing and local 
bispectra would increase to 0.6. 



VII. OPTIMAL BISPECTRUM ESTIMATION 

In this section, we present a general framework for op- 
timal bispectrum estimation in the presence of sky cuts 
and inhomogeneous noise. In the context of power spec- 
trum estimation, a method similar to ours was proposed 
by Oh et al. [22| . and it is illuminating to first revisit 
their construction. 



A. Power spectrum estimation 

The power spectrum estimation problem can be stated 
as follows. One is interested in simultaneously estimating 
the amplitude of bandpowers Ci, . . . , Cn^^^^ from a map 
TO with noise covariance N. It can be shown that the 
optimal estimator for each bandpower a is given by 



(50) 



where C = S + N is the total signal -I- noise, to is the 
observed temperature map (assumed to have covariance 
C), and Fafs is the Fisher matrix 

Fa^p = ^TT[CaC-'CpC~'] . (51) 

A subtlety in Eqs. ([SO)) . ([CTjl is that, strictly speaking, 
the definition of the estimator depends (via the matrix 
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C~^) on the signal covariance S which one is trying to 
estimate in the first place. In practice, the estimator can 
be iterated until convergence is reached; it can be shown 
that the limiting value of S obtained in this way is equal 
to a maximum likelihood estimate [2(| . This subtlety will 
be ignored in this section where our purpose is merely 
to set the stage, in the more familiar context of power 
spectrum estimation, for the bispectrum estimator which 
follows. 

Evaluating Eqs. ([50]) . ([51]) appears infeasible for large 
maps owing to the 0{Np^^) matrix operations which ap- 
pear. However, in [23 |. the following method was pro- 
posed. Considering Eq. ([50)) first, one only needs to mul- 
tiply a single map by C~^, which can often be done af- 
fordably (and without needing to store the matrix C in 
dense form) using conjugate gradient inversion, although 
the details will depend on the experiment's noise model. 
Considering next Eq. (|5T|) , the trace can be written as as 
a Monte Carlo average: 



1 



a/3 = :^ (a^C ^CaC ^Cf^C ^a)^ 



(52) 



where the notation {■)a denotes an average taken over 
signal -f noise realizations a (i.e., a is a Gaussian ran- 
dom field with covariance C). If we continue to assume 
an affordable procedure for multipliying a map by C~^, 
Eq. ([nil) permits F^p to be computed by Monte Carlo. 
The estimator covariance is then given by 



(53) 



The matrix Fap, defined by Eq. (|5ip . is the Fisher ma- 
trix for the survey with noise covariance given by an ar- 
bitrary matrix N. For optimal estimators, the Fisher 
matrix gives both the normalization (Eq. (|50p) and the 
covariance (Eq. ([55)) ). 

This method for optimal power spectrum estimation 
in the presence of arbitrary noise covariance C has an 
analogue for bispectra, as we will now see. 



B. Bispectrum estimation 

Let us now consider the analagous problem of opti- 
mal estimation of the amplitude of a given bispectrum 
Bi-^i^i^ . The form of the optimal estimator has been con- 
structed previously and shown to contain both cubic 
and linear terms: 



oFs 



h l2 is 

mi m2 



(54) 



where F^ is a constant which normalizes the estimator to 
have unit response to Bf -^i^i^. (The factor 1/6 has been 



included for later convenience.) In order to translate the 
value of £ to a constraint on the bispectrum amplitude, 
one needs to know both the normalization Fg and the 
variance Var(£). 

The cubic term in the estimator can be thought of as 
a matched filter whose shape is given by the bispectrum 
Biiii^s- The linear term can improve the variance of the 
estimator for certain bispectra (more precisely, bispec- 
tra of "squeezed" shape). For example, better limits on 
are obtained from the one-year WMAP data using 
an estimator containing the linear term than from the 
three-year WMAP data without the linear term 18, "i^. 
We note that for the fully optimal estimator (Eq. ([54)) ). 
the data appears weighted by inverse signal + noise C~^, 
and so the variance of the estimator always improves as 
more modes are added to the data. This is in contrast 
to suboptimal methods, such as those used to analyze 
WMAP data to date [II, [H, H IH , where as the cut- 
off multipole ^max is increased, the variance eventually 
worsens as a result of the inhomogeneities in the noise 
and the sky cuts. 

We now introduce notation which will be used through- 
out the rest of the paper. Given a map a = define 



D ^ \ mi 



«2 •^Z 

m2 ma 



^l^'m-i (^t2va2 ^^3^3 



(55) 



We also consider the gradient of T[a] with respect to the 
input map: 



_ , dcf dT[a] 1 
VimT[a\ = „ = -B, 



e i2 \ , 

da*„^ - 2""^'-^ 1^ m m2 mg J "^^™^"^3™3 

(56) 

Note that T[a] is a scalar which is cubic in the input map 
a, whereas VT[a] is another map which is quadratic in 
a. 

The significance of T is that three quantities of inter- 
est can be written compactly as Monte Carlo averages 
involving T, VT. First, the estimator (Eq. ()54p ) can be 
rewritten: 



£[a] = ^(T[C-'a] 



(57) 



obtaining the linear term as a Monte Carlo average. Here, 
(•)a' denotes the Monte Carlo average taken over Gaus- 
sian realizations a' of signal -I- noise. Second, the nor- 
malization constant is given by: 

Fs = l{yi,rrnT[C-'a]CiX^,^^V,,^,T[C-'a])^ (58) 

Third, the estimator variance is given by 

YaT{£)^F^\ (59) 
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Eqs. (|54|) - ((59)) are derived in Appendix El Taken to- 
gether, they provide the basis for the foUowing Monte 
Carlo procedure, which is the main result of this section: 

1. In each Monte Carlo iteration, construct a random 
signal + noise realization a. 

2. Evaluate C'^a, VT[C-ia], and C-^{VT[C-^a\) 
(see below). 

3. Accumulate the contribution to the following 
Monte Carlo averages: 

(Vi,^,T[C~^a])^ (60) 

(v,,„,,T[C-ia]C,-J„^,^,„^V,,„,,r[C-ia])^ (61) 

At the end of the Monte Carlo loop, the linear term 
in the estimator (Eq. (|57p ). the estimator normalization 
(Eq. ([58]) ). and the estimator variance (Eq. ([59]) ) have 
been computed. For each of these three quantities, the 
Monte Carlos converge to a value which incorporates the 
full noise covariance (including the sky cut), in contrast 
to methods which assume an /sky scaling. A feature of 
the method is that the variance (Eq. ([55)1 ) includes the 
effect of the linear term in the estimator, even though 
the linear term is not precomputed. Both are computed 
in parallel by the same set of Monte Carlos. After the 
Monte Carlo loop, one can evaluate the estimator (in the 
form given by Eq. (j57p ). including linear term and nor- 
malization, on the observed CMB map m. 

In each Monte Carlo iteration, one VT evaluation and 
two multiplications by are required. For the first of 
these ingredients, evaluating VT, we will give a fast algo- 
rithm in the next section, assuming that the bispectrum 
satisfies the factorizability condition (Eq. ([3])). 

A method for the second ingredient, multiplying a map 
by {S -\- N)~^, will depend on the noise model of the 
experiment under consideration. In this paper, where 
our emphasis is on general algorithms which are not 
experiment-specific, we will not address this problem. 
However, let us emphasize that the experiment-specific 
challenge of finding an affordable method for {S + N)~^ 
multiplication is a necessary ingredient in the optimal es- 
timators of Eq. (|50p and Eq. ([51)1 . If this problem can 
be solved, then the general framework we have presented 
here will permit optimal bispectrum estimation. Other- 
wise, one must fall back on a suboptimal estimator, e.g. 
replacing C^^ in Eq. (j54p by a filter which approximates 
it. Since it may not be feasible to solve the {S + N)^^ 
problem for every dataset in which the three-point func- 
tion is studied, we treat this case in App. [Bl including 
a discussion of how the linear term which improves the 
estimator variance may be retained, even when full opti- 
mality is lost. 

We conclude this section by describing the generaliza- 
tion of our Monte Carlo procedure to joint estimation 



of multiple bispectra _Bi, • • • Denote the quantity 

T[a] (Eq. ([55]) ). evaluated using bispectrum Ba, by rQ,[a]. 
Then the optimal estimator for Ba , debiased to have zero 
mean response to Bp ((3 ^ a), is given by: 

(62) 

where Faf3 is the matrix 

The estimator covariance is given by 

CoYi£a,£p) = F~^. (64) 

With multiple bispectra, the Monte Carlo procedure is 
the same as described above. In each iteration, C~^a, 
VTa[C-'^a], and C-^VTa[C-^a] are computed, for a to- 
tal of n evaluations of VT and (n+ 1) multiplications by 

The matrix Fap defined in Eq. (|64p is the 7i-by-7i Fisher 
matrix between the bispectra Bi, ■ ■ ■ , _B„ , with the full 
noise properties of the survey incorporated via the covari- 
ance C. (In the case where the noise is homogeneous, Fa^ 
reduces to the Fisher matrix that was defined previously 
in Eq. ([^5]) ). In the estimator framework, Fafj arises 
as both the normalization (Eq. (HH)) and the covariance 
(Eq. (|64p ) of optimal bispectrum estimators. This is in 
complete analogy with optimal power spectrum estima- 
tion as described in WII Al 

C. Comparison with direct MC 

We have obtained the estimator covariance and nor- 
malization (which are equal for the case of the optimal es- 
timator in Eq. ([51)1 ) as a Monte Carlo average (Eq. ([55]) ). 
The reader may be wondering why a simpler Monte Carlo 
procedure was not used instead. For example, consider 
the variance of the cubic term T[C~^a] in the estimator. 
Following the treatment above, this would be computed 
via the Monte Carlo prescription 

Var(r[C"^a]) = (65) 

(Compare Eq. (|58p . which gives the variance when the 
linear term is included in addition to the cubic term.) 
Why not compute this more simply by using the "direct" 
Monte Carlo prescription 

Var(T[C-ia]) = {T[C~'^a]T[C~'^a])a (66) 

instead? 
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FIG. 6: Monte Carlo error after N simulations, compared for 
one instance of the "fast" Monte Carlo method (Eq. (|65|l ) 
and one instance of the "direct" method (Eq. (|66|l 'l. showing 
much faster convergence in the first case. The local form of the 
bispectrum (Eq. HSjl) was used, with an all-sky homogeneous 
survey with ^max ~ 500, noise level 1000 /^K-arcmin, and 
beam Sfwhm = 10 arcmin. 



We have found that the convergence of the first ex- 
pression (Eq. (|65p ) is much more rapid than the second 
(Eq. This is illustrated in Fig. [51 where we show the 

dependence of the error on the number of Monte Carlo it- 
erations, for both prescriptions. It is seen that the Monte 
Carlo framework we have given above converges much 
more quickly than "direct" Monte Carlo, requiring only 
^ 100 simulations to reach 1% accuracy. For this reason, 
the Monte Carlo framework presented above is preferable 
to direct Monte Carlo computation of the estimator vari- 
ance, even though the computational cost per iteration 
is doubled (since VT must be computed instead of T, 
and two multiplications are needed instead of one). 
Another benefit is that the linear term in the optimal 
estimator is computed in parallel. 

It may seem suprising that the convergence rate of the 
two Monte Carlo prescriptions is so different. One way 
to see this intuitively is to note that the first expres- 
sion (Eq. (|65p ) contains two fewer powers of the map 
a than the second expression (Eq. In effect, one 

factor of {aa^) has been replaced "in advance" by its 
Monte Carlo average C, thus accelerating convergence. 
(Note that the same phenomenon exists in the context of 
power spectrum estimation; if the Fisher matrix is com- 
puted by Monte Carlo trace evaluation as in Eq. ([5^ . the 
convergence rate will be much faster than estimating the 
covariance of the estimator in Eq. ([50)1 by direct Monte 
Carlo.) 



VIII. EVALUATION OF T, VT 

We have now described a general Monte Carlo frame- 
work for optimal bispectrum estimation in the presence 



of sky cuts and inhomogeneous noise, with two ingredi- 
ents deferred: a method for evaluating {T, VT}, and a 
method for multipying a map hy (S + N)^^ . In this sec- 
tion, we address the first of these. We present a fast algo- 
rithm which, given an input set of multipoles a = {aim} 
and bispectrum written in factorizable form 

i X^Y^Z^+ (symm.) (67) 
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i=l 



evaluates T[a] and VT[a]. 

If Xi is any ^-dependent quantity, define Xa(x) to be 
the position-space map obtained by applying the filter 
Xi to aim'- 



Xa{yi) = YXtaimyi>rn{'x) 



(68) 



The basis for our algorithm will be the following position- 
space expression for T[a]: 

= 6^7 dcos(0)d^x«(^^,^)r«(0,^)zW(^?,^) 

(69) 



obtained from Eq. ([67[) and the identity 



d C0S{6) dip Yi^mi {0, <p)Yt^m2 {&! VWisms V) 



(2£i-l-l)(24 + l)(24 + l) 



47r 



mi 7712 ma 



1/2 



il h h 





(70) 



We next observe that the integral in Eq. (|69p can be done 
exactly using Gauss-Legendre quadrature in cos(0) with 
TVe — [3£niax/2j + 1 points, and uniform quadrature in 
Lp with 'N^p — (S^niax + 1) poiuts. This is because each 
term in the integrand is a polynomial in cos(0) of degree 
< 3^maa;, multipfied by a factor e*™*^ with — S^max < 
w < 3£„iax- (There is a subtlety here: some terms in 
Eq. ([S5)) contain odd powers of \J\ — ^ for which Gauss- 
Legendre integration is not exact, but each such term has 
an odd value of m, and hence gives zero when integrated 
over (/3.) 

This observation permits the integral to be replaced by 
a finite sum without approximation: 

(71) 

where wb denotes the Gauss-Legendre weight at the 
quadrature point B. 

Our algorithm for evaluating T[a], in the form (|7ip . is 
given as follows. There is an outer loop over 9 in which 
the contribution of each isolatitude ring to the integral is 
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accumulated. Within this loop (i.e. for fixed 9), the first 
step is to fill the matrix 



(72) 



using upward recursion in £ to generate the spherical har- 
monics. Second, we evaluate the matrix 



(73) 





WMAP3 


Planck 




(^max = 1000) 


(^„>ax = 2000) 


Point source (Eq. (0) 
ISW-lensing (Eq. O) 

Local (Eq. (fT2))) 
Equilateral (Eq. (I22j 
Gravitational (Eq. ([T6|l'l 
HD (Eq. (HU) 


3 CPU-sec 
10 CPU-sec 
84 CPU-sec 
117 CPU-sec 
467 CPU-sec 
149 CPU-sec 


1 CPU-min 

2 CPU-min 
14 CPU-min 
23 CPU-min 
81 CPU-min 
38 CPU-min 



by taking an FFT along each column of M. Third, we 
evaluate 



(74) 



TABLE V: CPU time needed for one evaluation of T[a], for 
each of the optimized bispectra from Tab. IIIII Evaluating 
Vr[a] in addition to T[a] would double the running times 
shown. 



by matrix multiplication. After this step, the matrix en- 
try is equal to the quantity Xa \0,(p) defined in 
Eq. ([68|) . We compute matrices Yi^, Zi^ analagously, 
replacing in Eq. ^ by Z^'\ The fourth step 
is to accumulate the contribution of one isolatitude ring 
(in Eq. ^) to T[a] as follows: 



T[a] ^ T[a] 



TTWg 



(75) 



This completes the algorithm for evaluating T; we now 
describe extra steps needed to evaluate VT. The idea 
is to compute derivatives of T with respect to each of 
the quantities defined in Eqs. ([7^ - (|75|) . in reverse order. 
First, differentiating Eq. ([75)l . one computes 



dT 



lip 



TTWe 



(76) 



and likewise for (dT/dYi^) and {dT / dZi^) . Similarly, by 
differentiating Eqs. ([72| -(|74 p . one computes the following 
quantities in order: 



dT 



(0 



dT 



Y,. 



dT 
5Y~ 



Z. 



dT 
dZiu, 



dT 
dM^ 

dT 



E 



dT 



dN, 



dT 



dap 



dM, 



■im 



(77) 

(78) 
(79) 



The final result gives the contribution of one isolatitude 
ring to VimT = {dT/da*^^). This procedure computes 
{r[a], VT[a]} in twice the running time needed to com- 
pute T[a\ alone. 

The algorithm we have just presented was closely 
inspired by the position-space estimator of Komatsu, 
Spergel and Wandelt [l^ , in the context of the local bis- 
pectrum (Eq. (fT3|) ). Indeed, our algorithm can be viewed 
as both generalizing the KSW estimator to an arbitrary 
input bispectrum satisfying the factorizability condition. 



and permitting calculation of the gradient VT[a] in addi- 
tion to T\a] . (We have seen that Monte Carlo evaluations 
of VT are needed to compute the linear term in the esti- 
mator, the normalization Fg, and the variance Var(£) in 
the presence of sky cuts and inhomogeneous noise.) 

Additionally, we have presented the details of the algo- 
rithm in an optimized way which dramatically improves 
the running time, compared to existing implementations. 
For example, in [18| . a running time of 60 CPU- minutes 
is quoted to evaluate the cubic term T[a] for the local bis- 
pectrum (Eq. (|13p ). using a quadrature in the r integral 
with 260 points, at Imax = 335. With these parameters, 
our implementation evalues T[a\ in 27 CPU-seconds; after 
optimzing iVfact using the method of CVl this is further 
improved to 4 CPU-seconds. 

The main reason that our implementation is so fast is 
that the only steps with cost ©(A^fact^fnax) have been 
written as matrix multiplications in Eqs. ((74)) . (|77p . 
These can be evaluated extremely efficiently using an 
optimzed library such as BLAS. In existing implemen- 
tations, the same asymptotic cost is accrued by means of 
A^fact separate 0{£l^^^ spherical harmonic transforms, 
but the overall prefactor is much larger. A second, less 
important, optimization is that we have converted the 
integral (Eq. ([69])) to a sum (Eq. ((7T|) ) using Gauss- 
Legendre quadrature in cos{9) and uniform quadrature 
in rather than using a pixelization such as Healpix. 
In addition to giving an exact evaluation of Eq. (|55p . 
our "pixelization" is optimized to minimize the number 
of quadrature points needed, which translates to smaller 
matrix sizes in the rate-limiting steps. 

In Tab. |Vl we have shown timings for one T[a\ evalu- 
ation in several mock surveys. (Note that in the Monte 
Carlo framework for Will several such evaluations are 
needed in each Monte Carlo iteration.) Using the op- 
timizations that we have presented here, which improve 
existing implementations by several orders of magnitude, 
a fully optimal bispectrum analysis for Planck (^max = 
2000) should easily be feasible. 



IX. SIMULATING NON-GAUSSIAN MAPS 

So far, our emphasis has been on bispectrum esti- 
mation; however, another apphcation of our machinery 
is that it provides a fast algorithm for simulating non- 
Gaussian maps. It should be emphasized from the outset 
that there is no "universal" probability distribution for 
a field whose power spectrum and bispectrum are pre- 
scribed. This is because the four- and higher-point con- 
nected correlation functions must be nonvanishing, for 
the probability density to be positive definite. In gen- 
eral, two schemes for simulating a non-Gaussian field, 
with the same power spectrum and bispectrum, will dif- 
fer in their higher-point amplitudes. However, we expect 
that our algorithm will be useful in the regime of weak 
non-Gaussianity, where higher-point amplitudes can be 
neglected. 

We present a simulation algorithm which generates all- 
sky simulated maps starting from arbitrary input power 
spectrum Ct, and any input bispectrum Bi^i^i^ which 
satisfies the factorizability condition (Eq. ([3])). The 
power spectrum and bispectrum of the field which is 
simulated will satisfy: 

= Cp, + 0{B^) (80) 
^'iii2i3 = Bi^i^i^+ 0{B ) (81) 

where 0{B^) denotes terms containing k or more powers 
of the bispectrum. For N > A, the connected A^-point 
function of the simulated field will satisfy: 

(a£imia^2m2 ■ ■ ■ afivin]v)conn. = 0{B^ (82) 

Under the assumption of weak non-Gaussianity, where 
the extra terms in Eq. ([5T|) can be neglected, the power 
spectrum and bispectrum of the simulated field will agree 
with the input values C^, Bi-^i^i^. The problem of sim- 
ulating non-Gaussian fields has received some attention 
in the literature [1^ [sO, Ull, [13] , but no method has been 
proposed with this generality. 

Our simulation algorithm is given as follows. One first 
simulates a Gaussian random field aim with power spec- 
trum C'i. Then define by perturbing to order 0{B) 
as follows: 

O'im = + -^yemT[Cg,^ai'm']- (83) 

The algorithm given in Wllll is used to evaluate VT. The 
GPU time needed for one random realization of a'f^ is 
therefore given by Tab. |V] (with a factor of two included 
for calculating VT in addition to T), e.g. 168 GPU- 
seconds for the local bispectrum at WMAP3 noise levels 
and £inax = 1000. With this level of performance, non- 
Gaussian simulations can easily be included in a Monte 
Garlo analysis of Planck data, in the generality of an 
arbitrary factorizable bispectrum. 

To lowest (zeroth) order in B, the power spectrum of 
^■^m is Let us calculate the lowest-order contribu- 
tion to the bispectrum of a^^. Plugging in the defintion 



16 




FIG. 7: One random realization obtained using the simula- 
tion algorithm from ijIXI consisting of a Gaussian map {airn} 
(top), and non-Gaussian maps {ofm} (middle), {a^^} (bot- 
tom). To lowest order, the map (afm + /]v£a^m + /^\f^^m) ^^^^ 
have the power spectrum of the fiducial model, and bispec- 
trum which is a linear combination of the local and equilateral 
forms (Eqs. (|12p . (|22|l '). All three maps have been smoothed 
with a 1° Gaussian beam. 

(Eq. ISni)) of VT, 

6 2 S \ I 



(.'^m'^^i' '^I!^m'^'^i2'm2^l3m3 T (Symm.j 

= Be^i^i^ . (85) 

\ mi TO2 / 

where -f (symm.) denotes the sum over five additional 
terms obtained by permuting {ii,mi). This shows that 
the lowest-order bispectrum is simply Bi-^i^^^; it is easy 
to see that the orders of the higher-order terms are as 
claimed in Eq. ([8T|) . 

In Fig. [71 we illustrate the method by showing a sin- 
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FIG. 8: Dependence of the la error o"(//y£) on /sky, for a 
spherical cap shaped survey with noise level 500 /iK-arcmin 
and 10 arcmin beam, showing good agreement between the 
Monte Carlo errors (circles) , and simple f~^y^^ scaling (dotted 
line) . 



gle random realization, split into three pieces. From 
Eq. (1551) ; sach realization is generated as a Gaussian piece 
(the first term on the right-hand side) plus a small non- 
Gaussian perturbation (the second term) which depends 
on the bispectrum. We have shown the Gaussian piece 
separately in Fig. [71 and also shown the non-Gaussian 
term for the case of the local (Eq. ([121) ) and equilateral 
(Eq. (221)) bispectra. 

Let us emphasize the caveats associated with the simu- 
lation algorithm presented here. A model which predicts 
non-Gaussianity at the three-point level will also pre- 
dict higher-point connected correlation functions; these 
will not be reproduced faithfully by the simulations. If 
higher-point correlations are important, then one must 
tailor the simulation method to the specific model; one 
cannot expect to use a "generic" algorithm which only 
incorporates two-point and three-point information, such 
as the algorithm we have given in this section. However, 
in the regime of weak non-Gaussianity, where the three- 
point function is marginal and higher-point correlations 
are negligible, our simulation method should apply. 



X. EXAMPLE SURVEYS 

The general Monte Carlo framework for optimal bis- 
pectrum estimation which has been presented above ap- 
plies to any survey for which a map can be efficiently mul- 
tiplied by C~^. We conclude by considering some mock 
surveys whose sky coverage and noise are azimuthally 
symmetric. The role of azimuthal symmetry is to make 
the noise covariance matrix diagonal in m, so that 
multiplication can be performed quickly by "brute force" 
[22]. 

Our first example will be a survey with homogeneous 



FIG. 9: Dependence of the la error a{f}^'j^) on ^max, for the 
azimuthally symmetric approximation to Planck described in 
ijXl showing an ^^ax dependence throughout the sample vari- 
ance limited regime. 



noise level 500 ^K-arcmin and Gaussian beam ^fwhm = 
10 arcmin, with the geometry of a spherical cap. In 
Fig. m we show la errors cr(/]vi) obtained using the op- 
timal estimation framework from Will for varying /sky 
It is seen that, with optimal estimators and for this sim- 
ple sky cut, cr(/]vL) varies as f~^y^^ over two orders of 
magnitude. 

As a second example, we consider a Planck-like sur- 
vey. To approximate Planck within the constraint of az- 
imuthal symmetry, we include a "galactic" sky cut which 
masks an equatorial band whose size is chosen to give 
/sky = 0.8, and take a pixel-dependent noise variance of 
the form 

a{0,^)= (^^^<jlsm{0) (86) 

where the average noise level cto is determined by the 
sensitivities in Tab. HI (This angular dependence of the 
noise is motivated by the Planck scan strategy, which 
scans along great circles through the ecliptic poles; how- 
ever, note that azimuthal symmetry requires us to place 
the poles perpendicular to the galactic cut.) 

Since it is currently unclear what cutoff in £ will be 
needed in Planck to ensure that foreground contamina- 
tion is sufficiently well-controlled for bispectrum estima- 
tion, we show the dependence of cr(/]^£) on irnax in Fig. [51 
It is seen that, throughout the range of multipoles where 
Planck is sample variance limited {£ < 1500), cr{f}^1) 
varies roughly as €^ax: expected from [sj]. At higher 
£, cr{fj^'i) flattens as instrumental noise becomes a con- 
taminant. At ^inax — 2000, the largest value considered, 
the 1(7 error agrees with the Fisher matrix forecast from 
WII This demonstrates the ability of the optimal es- 
timator to saturate statistical limits on cr(/]^£) in the 
presence of sky cuts and inhomogeneous noise. Another 
feature seen in Fig. [51 is that o-{f}^1) always decreases as 
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^max increases. This is a characteristic of optimal estima- 
tors, which can never worsen as more modes are added. 
In contrast, for non-optimal bispectrum estimators, the 
variance eventually begins to worsen when £niax becomes 
larg e enough that not all modes are signal-dominated 

MM- 



XI. DISCUSSION 

Perhaps the most fundamental problem in connecting 
a predicted shape of the CMB bispectrum with data is 
that analysis techniques which allow an arbitrary angular 
bispectrum Bg-^g^g^ are computationally prohibitive. For 
example, even with the simplifying assumption of all-sky 
homogeneous noise, the cost of evaluating an optimal es- 
timator is C(^max)i "iue to the number of nonzero terms 
in the harmonic-space sum (Eq. (j54p ). In practice, this 
problem has meant that bispectrum estimation to date 
has been limited to a few special forms of the bispec- 
trum, such as the "local" shape (Eq. p^ ). where fast 
estimators are available. On the other hand, there is a 
growing menagerie of theoretically motivated bispectra 
from secondary anisotropics 1^, [11, [13, [13, [l^l and 
early universe physics [3, 4_, ^, 6,, |8| , which one might wish 
to study in a dataset such as Planck. 

We have shown that the factorizability criterion for 
bispectra (Eq. ([3])) is a compromise in generality which 
is specific enough to enable fast algorithms, yet general 
enough to encompass a wide range ( ijllip of predicted 
shapes for Bg^g^g^ . We have given several fast algorithms 
which operate in the generality of an arbitrary factoriz- 
able bispectrum. In each case, the idea behind the al- 
gorithm is that the factorizability condition permits effi- 
cient computation by translating from harmonic to posi- 
tion space. 

The first such algorithm is an optimization algorithm 
which, given an input bispectrum written as a sum of 
many factorizable terms, outputs an "optimized" bis- 
pectrum with fewer terms which closely approximates 
the input. It is feasible to run this algorithm on very 
large input sizes; e.g. the higher-derivative bispectrum 
for Planck (with iVfact = 80000 and = 2000) was 

treated, using a two-level optimization procedure de- 
scribed in fjVj In this example, the optimization algo- 
rithm outputs a bispectrum which has only 86 factoriz- 
able terms, and is provably indistinguishable from the 
input; the algorithm is only allowed to terminate when 
the Fisher distance between the two is < 10~^. 

Second, we have given a general Monte Carlo based 
framework for optimal bispectrum estimation, which re- 
lies on two ingredients: a method for computing the 
quantities {T[a], V£„iT[a]} defined in Eq. ([55]) . and a 
method for multiplying a map a = {agm} by {S 4- N)~^, 
where S and N are signal and noise covariance matri- 
ces. For the first of these, we have given a fast algo- 
rithm which can be thought of as generalizing the KSW 
estimator (constructed in fl^ for the local shape) to 



any factorizable bispectrum, and providing a convenient 
way to compute the linear term in the optimal estimator 
(Eq. (|54p ). Additionally, we have described optimiza- 
tions, such as rewriting the slowest steps as matrix mul- 
tiplications, which improve the running time of existing 
implementations by several orders of magnitude. This 
speedup allows us to use large values of ^max and should 
make a Monte Carlo analysis for Planck very affordable 
(Tab.©. 

Our estimator is fully optimal under the assumption 
that an affordable method can be found for multiplying a 
CMB map by [S + N)~^, where S and N are signal and 
noise covariances respectively. Finding such a method 
will depend on the noise model and is outside the scope 
of this paper, where our emphasis has on algorithmic as- 
pects of bispectrum estimation and simulation which are 
not experiment-specific. If no such method can be found, 
then our estimator is not fully optimal, but still includes 
the linear term, which should improve constraints in the 
presence of inhomogeneous noise (App. [B]). The "C~^ 
problem" is a general ingredient in optimal estimators 
and also arises, e.g. for optimal power spectrum esti- 
mation and for lens reconstruction. We will discuss this 
problem, with emphasis on features of the noise model 
which arise in WMAP, in a forthcoming paper analyzing 
three-year WMAP data. 

Finally, we have given a simulation algorithm which 
generates random sky maps with prescribed power spec- 
trum and factorizable bispectrum. This greatly extends 
the generality of existing methods for simulating non- 
Gaussian fields, and is computationally inexpensive, eas- 
ily permitting non-Gaussian simulations to be included 
in Monte Carlo based pipelines if needed. An important 
caveat is that the higher-point correlation functions are 
not guaranteed to match the model which gives rise to 
the bispectrum; the higher correlations are merely guar- 
anteed to be small (Eq. Therefore, the simulation 
method should only be relied upon in the regime of weak 
non-Gaussianity, where higher correlations are negligible. 

As an application of these techniques, we have done 
a Fisher matrix forecast for multiple bispectra at Planck 
sensitivities ( Wip . Of the bispectra considered, we found 
that four were nondegenerate: the point source, ISW- 
lensing, local, and equilateral shapes. Correlations be- 
tween these four are small, so that the shapes can be in- 
dependently constrained. However, at Planck sensitivity 
levels, the ISW-lensing bispectrum can still significantly 
bias estimates of the local bispectrum, if not marginalized 
in the analysis. We have also demonstrated the optimal 
estimator on example surveys (jQC]). showing that it is 
both computationally affordable and achieves statistical 
limits on cr(/]^£) for a Planck-like survey with inhomo- 
geneous noise and -^max = 2000. 

Our most general conclusion is that the factorizabil- 
ity criterion (Eq. ([3])) is a promising approach for bridg- 
ing the gap between a theoretically motivated shape of 
the bispectrum and data. We have described a generic 
"toolkit" , with algorithms for Fisher forecasting, analy- 
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sis, and simulation, which can be implemented once and 
subsequently applied to any bispectrum which can be 
written in factorizable form. Even if the number of fac- 
torizable terms appears to be intractably large 10^), 
the optimization algorithm (^V| can still be used and 
may reduce the number of terms to a manageable level, 
as in the case of the higher-derivative shape. 
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APPENDIX A: MONTE CARLO AVERAGES 

The purpose of this appendix is to derive 
Eqs. (HZD, dSSD, and §^ in WUi in which Monte 
Carlo expressions are given for the linear term in the 
estimator £, the normalization Fg, and the variance 
Var(f). We note that Monte Carlo averages involving 
(C~^a), where a is a Gaussian random field with 
covariance C, can be evaluated using the contraction: 



(C ^a)i>,mAC ^a)/?2m2 - C'fimi/^m^- (Al) 

It will be convenient to define the following quantities: 



def „ „ / il i2 is \ h 4 4 

\ mi 7712 ITT-S I \ "^4 "^5 TTlQ 

ii mi .£41714 



(A2) 



In terms of these, we evaluate the following Monte Carlo 
averages, using the definition (Eq. ([55)) ) of T: 
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The Monte Carlo expression (Eq. ([57]) ) for E follows im- 
mediately from the definition (Eq. ([M]) ) and Eq. (|A3|) 
above. 

Turning next to the estimator normalization i^g, the 
definition (Eq. ([5^ ) implies: 



p 1 R R / 4 4 4 W 4 4 4 
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Comparing with Eq. (jASp . (jA4p . the Monte Carlo expres- 
sion (Eq. dSS])) for Fg follows. 

Finally, from the definition (Eq. ([M]) '). the estimator 
variance is given by a sum of three terms. 



Var(£[a]) = — {t[C-^ a]T[C~^ a] 



(A6) 



-T[C-^a]{C-^a)tmPtm 

4 la 



which are evaluated as follows: 
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(A9) 



Putting Eqs. ((M|) - ((A9|) together, we get 
completing the derivation of Eq. ^ 



(AlO) 



APPENDIX B: NO C"' 

Our Monte Carlo framework for optimal bispectrum 
estimation ( WIip has one experiment-specific require- 
ment: a method for multiplying a CMB map by C~^. 
In this appendix, we consider the case where this oper- 
ation is impractical. We will see that it is possible to 
preserve one feature of the optimal estimator: improving 
the estimator variance by including the linear term in the 
bispectrum estimator. 

We construct a bispectrum estimator by replacing 
where it appears in the optimal estimator (Eq. (|54p ) by 
some filter F which approximates as well as possible: 



F£> 



T[Fa] - {Fa)eJ\/e,nT[Fa' 



(Bl) 



where we use primes to distinguish this from the opti- 
mal estimator. It can be shown that this choice of linear 
term minimizes the variance, if the cubic term T[Fa\ is 
assumed fixed. (In particular, omitting the linear term 
from the estimator £' defined above always worsens the 
variance.) This estimator only depends on the ability to 



generate simulated signal -I- noise maps (Fa) which are 
filtered in the same way as the data. Any realistic anal- 
ysis pipeline can generate such Monte Carlo simulations. 

By a calculation parallel to App. \^ it can be shown 
the estimator normalization and variance are given by 
the following Monte Carlo averages: 



Ff, - 



-(VT[Fs]FVT[Ci\3i.ra] 



Var(f') = -NT[Fa]FCF'^VT[Fa] 
3 \ 



(B2) 
(B3) 



--(VT[Fa]j {FCF^)(VT[Fa] 



Note that the two are not equal, in contrast to the op- 
timal estimator. In the first expression (Eq. (|B2[) ). the 
Monte Carlo average is taken over signal-only realizations 
s and the C^^ simply refers to division by the signal 
power spectrum, without reference to the noise model. 
In the second expression (Eq. (|B3P ). the average is taken 
over filtered signal -I- noise realizations (Fa) as usual. 

The Monte Carlo prescription given in Eqs. (jB2p . (|B3P 
does require multiplying CMB maps by F and by FCF^ . 
(Note that FCF^ is just the covariance matrix of the 
simulated field [Fa).) These operations are easier than 
multiplying by C~^, but in a "pure" Monte Carlo pipeline 
in which the only possible operation is generating simu- 
lations, one can always fall back on direct Monte Carlo 
evaluations of £' to compute the normalization and vari- 
ance. (Note that computing the estimator normaliza- 
tion in this way requires non-Gaussian simulations, us- 
ing the algorithm from ^11X1 ) As discussed in WII CI 
direct Monte Carlo will be slower than a scheme such as 
Eqs. (Ea), (lB3l) . 



[1] J. M. Maldacena, JHEP 05, 013 (2003), astro- 
ph/0210603. 

[2] V. Acquaviva, N. Bartolo, S. Matarrese, and A. Riotto, 

Nucl. Phys. B667, 119 (2003), astro-ph/0209156. 
[3] G. I. Rigopoulos, E. P. S. Shellard, and B. W. van Tent, 

Phys. Rev. D73, 083522 (2006), astro-ph/0506704. 
[4] G. I. Rigopoulos, E. P. S. Shellard, and B. W. van Tent 

(2005), astro-ph/0511041. 
[5] X. Chen, M.-x. Huang, S. Kachru, and G. Shin (2006), 

hep-th/0605045. 
[6] X. Chen, R. Easther, and E. A. Lim (2006), astro- 

ph/0611645. 

[7] P. Creminelli, JCAP 0310, 003 (2003), astro- 
ph/0306122. 

[8] M. Alishahiha, E. Silverstein, and D. Tong, Phys. Rev. 

D70, 123505 (2004), hep-th/0404084. 
[9] M. Zaldarriaga, Phys. Rev. D69, 043508 (2004), astro- 

ph/0306006. 

[10] N. Arkani-Hamed, P. Creminelli, S. Mukohyama, 
and M. Zaldarriaga, JCAP 0404, 001 (2004), hep- 
th/0312100. 



[11] P. Creminelli, L. Senatore, and M. Zaldarriaga (2006), 

astro-ph/0606001. 
[12] D. Babich, P. Creminelli, and M. Zaldarriaga, JCAP 

0408, 009 (2004), astro-ph/0405356. 
[13] P. Creminelli and M. Zaldarriaga, JCAP 0410, 006 

(2004), astro-ph/0407059. 
[14] D. N. Spergel and D. M. Goldberg, Phys. Rev. D59, 

103001 (1999), astro-ph/9811252. 

[15] D. M. Goldberg and D. N. Spergel, Phys. Rev. D59, 

103002 (1999), astro-ph/9811251. 

[16] L. Verde and D. N. Spergel, Phys. Rev. D65, 043007 

(2002), astro-ph/0108179. 
[17] A. R. Cooray and W. Hu, Astrophys. J. 534, 533 (2000), 

astro-ph/9910397. 
[18] P. Creminelli, A. Nicolis, L. Senatore, M. Tegmark, 

and M. Zaldarriaga, JCAP 0605, 004 (2006), astro- 

ph/0509029. 

[19] E. Komatsu, D. N. Spergel, and B. D. Wandelt, Astro- 
phys. J. 634, 14 (2005), astro-ph/0305189. 

[20] J. R. Bond, A. H. Jaffe, and L. Knox, Phys. Rev. D57, 
2117 (1998), astro-ph/9708203. 



21 



[21] B. D. Wandelt and F. K. Hansen, Phys. Rev. D67, 

023001 (2003), astro-ph/0106515. 
[22] S. P. Oh, D. N. Spergel, and G. Hinshaw, Astrophys. J. 

510, 551 (1999), astro-ph/9805339. 
[23] B. D. Wandelt, E. Hivon, and K. M. Gorski (2000), astro- 

ph/OOOSlll. 

[24] I. Szapudi, S. Prunet, D. Pogosyan, A. S. Szalay, and 

J. R. Bond (2000), astro-ph/0010256. 
[25] E. Hivon et al. (2001), astro-ph/0105302. 
[26] G. Efstathiou, Mon. Not. Roy. Astron. Soc. 349, 603 

(2004), astro-pli/0307515. 
[27] B. D. Wandelt, D. L. Larson, and A. Lakshminarayanan, 

Phys. Rev. D70, 083511 (2004), astro-ph/0310080. 
[28] J. Jewell, S. Levin, and C. H. Anderson, Astrophys. J. 

609, 1 (2004), astro-ph/0209560. 
[29] E. Komatsu et al., Astrophys. J. Suppl. 148, 119 (2003), 

astro-ph/0302223. 
[30] M. Liguori, S. Matarrese, and L. Moscardini, Astrophys. 

J. 597, 57 (2003), astro-ph/0306248. 
[31] C. R. Contaldi and J. Magueijo, Phys. Rev. D63, 103512 

(2001), astro-ph/0101512. 
[32] G. Rocha, M. P. Hobson, S. Smith, P. Ferreira, and 

A. Challinor, Mon. Not. Roy. Astron. Soc. 357, 1 (2005), 

astro-ph/0406136. 
[33] D. Babich and M. Zaldarriaga, Phys. Rev. D7G, 083005 

(2004), astro-ph/0408455. 
[34] E. Komatsu and D. N. Spergel, Phys. Rev. D63, 063002 

(2001), astro-ph/0005036. 
[35] E. Komatsu (2002), astro-ph/0206039. 
[36] W. Hu, Phys. Rev. D62, 043007 (2000), astro- 

ph/0001303. 

[37] U. Seljak and M. Zaldarriaga, Phys. Rev. D60, 043504 

(1999), astro-ph/9811123. 
[38] W. Hu, Astrophys. J. 529, 12 (2000), astro-ph/9907103. 



[39] L. Toffolatti et al., Mon. Not. Roy. Astron. Soc. 297, 117 

(1998), astro-ph/9711085. 
[40] L.-M. Wang and M. Kamionkowski, Phys. Rev. D61, 

063504 (2000), astro-ph/9907431. 
[41] M. Liguori, F. K. Hansen, E. Komatsu, S. Matarrese, 

and A. Riotto, Phys. Rev. D73, 043505 (2006), astro- 

ph/0509098. 

[42] W. H. Press, B. P. Flannery, S. A. Teukolsky, and 
W. T. Vetterling, Numerical Recipes: The Art of Sci- 
entific Computing (Cambridge University Press, Cam- 
bridge (UK) and New York, 1992), 2nd ed., ISBN 0-521- 
43064-X. 

[43] A. Albrecht et al. (2006), astro-ph/0609591. 

[44] P. Creminelli, L. Senatore, M. Zaldarriaga, and 

M. Tegmark (2006), astro-ph/0610600. 
[45] D. Babich, Phys. Rev. D72, 043003 (2005), astro- 

ph/0503375. 

[46] D. N. Spergel et al. (2006), astro-ph/0603449. 

[47] P. Creminelh and M. Zaldarriaga, Phys. Rev. D70, 
083532 (2004), astro-ph/0405428. 

[48] This form of the bispectrum is problematic as it does 
not satisfy a simple consistency condition resulting from 
causality requirements [l^, • The bispectrum does not 
vanish when one takes the limit of one mode being very 
large compared to the horizon. We will ignore this prob- 
lem here as we are just taking it as an illustrative example 
that has been recently analyzed in detain in the litera- 
ture. We will later find that this shape is very correlated 
with a linear combination of the local and equilateral 
shapes. The correlation with the local shape is probably 
unphysical, originating from the fact that this shape does 
not satisfy the appropriate consistency relation. 



